Abstract


We show that several online combinatorial optimization problems that admit efficient no-regret
algorithms become computationally hard in the sleeping setting, where a subset of actions becomes
unavailable in each round. Specifically, we show that the sleeping versions of these problems are at
least as hard as PAC learning DNF expressions, a long-standing open problem. We show hardness for the
sleeping versions of Online Shortest Paths, Online Minimum Spanning Tree, Online $k$-Subsets,
Online $k$-Truncated Permutations, Online Minimum Cut, and Online Bipartite Matching. The
hardness result for the sleeping version of the Online Shortest Paths problem resolves an open problem
presented at COLT 2015 [Koolen et al., 2015].


1 Introduction 


Online learning is a sequential decision-making problem where a learner repeatedly chooses an action in
response to adversarially chosen losses for the available actions. The goal of the learner is to minimize the
regret, defined as the difference between the total loss of the algorithm and the loss of the best fixed action
in hindsight. In online combinatorial optimization, the actions are subsets of a ground set of elements (also
called components) with some combinatorial structure. The loss of an action is the sum of the losses of
its elements. A particularly well-studied instance is the Online Shortest Path problem [Takimoto and
Warmuth, 2003] on a graph, in which the actions are the paths between two fixed vertices and the elements
are the edges.

We study a sleeping variant of online combinatorial optimization where the adversary chooses not only the
losses but also the availability of the elements in every round. The unavailable elements are called sleeping
or sabotaged. In the Online Sabotaged Shortest Path problem, for example, the adversary specifies the
unavailable edges in every round, and consequently the learner cannot choose any path using those edges. A
straightforward application of the sleeping experts algorithm proposed by Freund et al. [1997] gives a
no-regret learner, but it takes exponential time (in the input graph size) in every round. The design of a
computationally efficient no-regret algorithm for the Online Sabotaged Shortest Path problem was
presented as an open problem at COLT 2015 by Koolen et al. [2015].


In this paper, we resolve this open problem and prove that the Online Sabotaged Shortest Path problem
is computationally hard. Specifically, we show that a polynomial-time low-regret algorithm for this problem
implies a polynomial-time algorithm for PAC learning DNF expressions, which is a long-standing open
problem. The best known algorithm for PAC learning DNF expressions on $n$ variables has time complexity
$2^{\tilde{O}(n^{1/3})}$ [Klivans and Servedio, 2001].


Our reduction framework (Section 4) in fact shows a general result: any online sleeping combinatorial
optimization problem with two simple structural properties is at least as hard as PAC learning DNF
expressions. Leveraging this result, we obtain hardness results for the sleeping variants of well-studied
online combinatorial optimization problems for which a polynomial-time no-regret algorithm exists: Online
Minimum Spanning Tree, Online $k$-Subsets, Online $k$-Truncated Permutations, Online Minimum Cut,
and Online Bipartite Matching (Section 5).

Our hardness result applies to the worst-case adversary as well as a stochastic adversary, who draws an 
i.i.d. sample every round from a fixed (but unknown to the learner) joint distribution over availabilities and 
losses. This implies that no-regret algorithms would require even stronger restrictions on the adversary. 


1.1 Related Work 


Online Combinatorial Optimization. The standard problem of online linear optimization with $d$ actions
(the Experts setting) admits algorithms with $O(d)$ running time per round and $O(\sqrt{T \log d})$ regret
after $T$ rounds [Littlestone and Warmuth, 1994, Freund and Schapire, 1997], which is minimax optimal
[Cesa-Bianchi and Lugosi, 2006, Chapter 2]. A naive application of such algorithms to an online combinatorial
optimization problem (precise definitions to be given momentarily) over a ground set of $d$ elements results
in $\exp(O(d))$ running time per round and $O(\sqrt{Td})$ regret.
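To make the $O(d)$ per-round cost concrete, the following is a minimal sketch of an exponential-weights (Hedge-style) learner for the Experts setting. The function name `hedge`, the step size `eta`, and the toy loss sequence are assumptions made for illustration and are not taken from the cited papers.

```python
import math
from typing import List


def hedge(loss_rounds: List[List[float]], eta: float) -> float:
    """Run exponential weights over d experts and return the total expected loss.

    Each round costs O(d): one pass to normalize the weights and one pass to
    update them. Applying this to a combinatorial problem with one expert per
    action would make d exponential in the ground-set size.
    """
    d = len(loss_rounds[0])
    weights = [1.0] * d
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Multiplicative update: experts with larger loss lose weight faster.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return expected_loss


# Toy usage (hypothetical data): expert 0 is always right, expert 1 always wrong.
T = 100
rounds = [[0.0, 1.0]] * T
eta = math.sqrt(8 * math.log(2) / T)
print(hedge(rounds, eta))
```

The step size above is one common tuning for losses in $[0,1]$; other choices are possible and only change constants in the regret.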

Despite this, many online combinatorial optimization problems, such as the ones considered in this paper,
admit algorithms with¹ $\mathrm{poly}(d)$ running time per round and $O(\mathrm{poly}(d)\sqrt{T})$ regret
[Takimoto and Warmuth, 2003, Kalai and Vempala, 2005, Koolen et al., 2010, Audibert et al., 2013]. In fact,
Kalai and Vempala [2005] show that the existence of a polynomial-time algorithm for an offline combinatorial
problem implies the existence of an algorithm for the corresponding online optimization problem with the
same per-round running time and $O(\mathrm{poly}(d)\sqrt{T})$ regret.


Online Sleeping Optimization. In studying online sleeping optimization, three different notions of regret
have been used: (a) policy regret, (b) ranking regret, and (c) per-action regret, in decreasing order of
the computational hardness of achieving no regret. Policy regret is the total difference between the loss of
the algorithm and the loss of the best policy, which maps a set of available actions and the observed loss
sequence to an available action [Neu and Valko, 2014]. Ranking regret is the total difference between the
loss of the algorithm and the loss of the best ranking of actions, which corresponds to a policy that chooses
in each round the highest-ranked available action [Kleinberg et al., 2010, Kanade and Steinke, 2014, Kanade
et al., 2009]. Per-action regret is the difference between the loss of the algorithm and the loss of an action,
summed over only the rounds in which the action is available [Freund et al., 1997, Koolen et al., 2015]. Note
that policy regret upper bounds ranking regret, and while ranking regret and per-action regret are generally
incomparable, per-action regret is usually the smallest of the three notions.

The sleeping Experts (also known as Specialists) setting has been extensively studied in the literature
[Freund et al., 1997, Kanade and Steinke, 2014]. In this paper we focus on the more general online sleeping
combinatorial optimization problem and, in particular, on the per-action notion of regret.

A summary of known results for online sleeping optimization problems is given in Figure 1. Note in
particular that an efficient algorithm is known for minimizing per-action regret in the sleeping Experts
problem [Freund et al., 1997]. We show in this paper that a similar efficient algorithm for minimizing
per-action regret in online sleeping combinatorial optimization problems cannot exist, unless there is an
efficient algorithm for learning DNFs. Our reduction technique is closely related to that of Kanade and
Steinke [2014], who reduced agnostic learning of disjunctions to ranking regret minimization in the sleeping
Experts setting.


2 Preliminaries 

An instance of online combinatorial optimization is defined by a ground set $U$ of $d$ elements and a decision
set $\mathcal{D}$ of actions, each of which is a subset of $U$. In each round $t$, the online learner is required
to choose an action $V_t \in \mathcal{D}$, while simultaneously an adversary chooses a loss function
$\ell_t : U \to [-1, 1]$. The loss of any

¹In this paper, we use the $\mathrm{poly}(\cdot)$ notation to indicate a polynomially bounded function of the arguments.

| Regret notion | Bound | Sleeping Experts | Sleeping Combinatorial Opt. |
|---|---|---|---|
| Policy | Upper | $O(\sqrt{T \log d})$, under ILA [Kanade et al., 2009] | $O(\mathrm{poly}(d)\sqrt{T})$, under ILA [Neu and Valko, 2014, Abbasi-Yadkori et al., 2013] |
| Policy | Lower | | $\Omega(\mathrm{poly}(d)\,T^{1-\delta})$, under SLA [Abbasi-Yadkori et al., 2013] |
| Ranking | Lower | $\Omega(\mathrm{poly}(d)\,T^{1-\delta})$, under SLA [Kanade and Steinke, 2014] | $\Omega(\exp(\Omega(d))\sqrt{T})$, under SLA [Easy construction, omitted] |
| Per-action | Upper | $O(\sqrt{T \log d})$, adversarial setting [Freund et al., 1997] | |
| Per-action | Lower | | $\Omega(\mathrm{poly}(d)\,T^{1-\delta})$, under SLA [This paper] |

Figure 1: Summary of known results. The Stochastic Losses and Availabilities (SLA) assumption is where the
adversary chooses a joint distribution over losses and availabilities before the first round, and takes an
i.i.d. sample in every round. The Independent Losses and Availabilities (ILA) assumption is where the adversary
chooses losses and availabilities independently of each other (one of the two may be adversarially chosen; the
other one is then chosen i.i.d. in each round). Policy regret upper bounds ranking regret, which in turn upper
bounds per-action regret for the problems of interest; hence some bounds shown in some cells of the table carry
over to other cells by implication and are not shown for clarity. The lower bound on ranking regret in online
sleeping combinatorial optimization is unconditional and holds for any algorithm, efficient or not. All other
lower bounds are computational, i.e., they hold for polynomial-time algorithms, assuming intractability of
certain well-studied learning problems, such as learning DNFs or learning noisy parities.


$V \in \mathcal{D}$ is given by (with some abuse of notation)

$$\ell_t(V) := \sum_{e \in V} \ell_t(e).$$

The learner suffers loss $\ell_t(V_t)$ and obtains $\ell_t$ as feedback. The regret of the learner with respect
to an action $V \in \mathcal{D}$ is defined to be

$$\mathrm{Regret}_T(V) := \sum_{t=1}^{T} \ell_t(V_t) - \ell_t(V).$$

We say that an online optimization algorithm has a regret bound of $f(d, T)$ if $\mathrm{Regret}_T(V) \le f(d, T)$
for all $V \in \mathcal{D}$. We say that the algorithm has no regret if $f(d, T) = \mathrm{poly}(d)\,T^{1-\delta}$
for some $\delta \in (0, 1)$, and that it is computationally efficient if it has a per-round running time of
order $\mathrm{poly}(d, T)$.
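The two definitions above translate directly into code. The following is a minimal sketch, assuming losses are stored as dictionaries keyed by element and actions as frozensets; the names `action_loss` and `regret` and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List

Element = Hashable
Action = FrozenSet[Element]
Loss = Dict[Element, float]


def action_loss(loss: Loss, action: Action) -> float:
    """Loss of an action: the sum of the losses of its elements."""
    return sum(loss[e] for e in action)


def regret(losses: List[Loss], played: List[Action], comparator: Action) -> float:
    """Total loss of the played actions minus the total loss of a fixed action."""
    return sum(
        action_loss(loss, v_t) - action_loss(loss, comparator)
        for loss, v_t in zip(losses, played)
    )


# Toy instance (hypothetical): ground set {a, b, c}, actions are 2-element subsets.
losses = [{"a": 1.0, "b": -1.0, "c": 0.0}, {"a": 0.0, "b": 1.0, "c": -1.0}]
played = [frozenset({"a", "c"}), frozenset({"a", "b"})]
print(regret(losses, played, frozenset({"b", "c"})))
```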

We now define an instance of online sleeping combinatorial optimization. In this setting, at the start
of each round $t$, the adversary selects a set of sleeping elements $S_t \subseteq U$ and reveals it to the
learner. Define $A_t = \{V \in \mathcal{D} \mid V \cap S_t = \emptyset\}$, the set of awake actions at round $t$;
the remaining actions in $\mathcal{D}$, called sleeping actions, are unavailable to the learner for that round.
If $A_t$ is empty, i.e., there are no awake actions, then the learner is not required to do anything for that
round and the round is discarded from the computation of the regret.

For the rest of the paper, unless noted otherwise, we use per-action regret as our performance measure.
Per-action regret with respect to $V \in \mathcal{D}$ is defined as:

$$\mathrm{Regret}_T(V) := \sum_{t \,:\, V \in A_t} \ell_t(V_t) - \ell_t(V). \tag{1}$$

In other words, our notion of regret considers only the rounds in which $V$ is awake.
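As an illustration of definition (1), the sketch below sums the loss difference only over rounds in which the comparator is awake. The data layout (per-round dictionaries of losses for awake elements, sets of sleeping elements) and the function name are assumptions for this example.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

Element = Hashable
Action = FrozenSet[Element]


def per_action_regret(
    losses: List[Dict[Element, float]],
    sleeping_sets: List[Set[Element]],
    played: List[Action],
    comparator: Action,
) -> float:
    """Per-action regret: rounds where the comparator is asleep are skipped."""
    total = 0.0
    for loss, s_t, v_t in zip(losses, sleeping_sets, played):
        if not comparator.isdisjoint(s_t):
            continue  # comparator uses a sleeping element, so it is not in A_t
        total += sum(loss[e] for e in v_t) - sum(loss[e] for e in comparator)
    return total


# Toy instance (hypothetical): element c sleeps in round 1 only.
losses = [{"a": 0.5, "b": -0.5}, {"a": 0.0, "b": 1.0, "c": -1.0}]
sleeping_sets = [{"c"}, set()]
played = [frozenset({"a", "b"}), frozenset({"a", "b"})]
print(per_action_regret(losses, sleeping_sets, played, frozenset({"b", "c"})))
```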

For clarity, we define an online combinatorial optimization problem as a family of instances of online
combinatorial optimization (and correspondingly for online sleeping combinatorial optimization). For example,
the Online Shortest Path problem is the family of all instances on all graphs with designated source and
sink vertices, where the decision set $\mathcal{D}$ is the set of paths from the source to the sink, and the
elements are the edges of the graph.

Our main result is that many natural online sleeping combinatorial optimization problems are unlikely
to admit a computationally efficient no-regret algorithm, although their non-sleeping versions (i.e.,
$A_t = \mathcal{D}$ for all $t$) do. More precisely, we show that these online sleeping combinatorial
optimization problems are at least as hard as PAC learning DNF expressions, a long-standing open problem.


3 Online Agnostic Learning of Disjunctions 


Instead of directly reducing PAC learning DNF expressions to no-regret learning for online sleeping
combinatorial optimization problems, we use an intermediate problem, online agnostic learning of disjunctions.
By a standard online-to-batch conversion argument [Kanade and Steinke, 2014], online agnostic learning
of disjunctions is at least as hard as agnostic improper PAC-learning of disjunctions [Kearns et al., 1994],
which in turn is at least as hard as PAC-learning of DNF expressions [Kalai et al., 2012]. The online-to-batch
conversion argument allows us to assume a stochastic adversary (i.i.d. input sequence) for online agnostic
learning of disjunctions, which in turn implies that our reduction applies to online sleeping combinatorial
optimization with a stochastic adversary.

Online agnostic learning of disjunctions is a repeated game between the adversary and a learning
algorithm. Let $n$ denote the number of variables in the disjunction. In each round $t$, the adversary chooses a
vector $x_t \in \{0,1\}^n$, the algorithm predicts a label $\hat{y}_t \in \{0,1\}$, and then the adversary reveals
the correct label $y_t \in \{0,1\}$. If $\hat{y}_t \ne y_t$, we say that the algorithm makes an error.

For any predictor $\phi : \{0,1\}^n \to \{0,1\}$, we define the regret with respect to $\phi$ after $T$ rounds as

$$\mathrm{Regret}_T(\phi) = \sum_{t=1}^{T} \mathbb{1}[\hat{y}_t \ne y_t] - \mathbb{1}[\phi(x_t) \ne y_t].$$


Our goal is to design an algorithm that is competitive with any disjunction, i.e., for any disjunction $\phi$
over $n$ variables, the regret is bounded by $\mathrm{poly}(n) \cdot T^{1-\delta}$ for some $\delta \in (0,1)$.
Recall that a disjunction over $n$ variables is a boolean function $\phi : \{0,1\}^n \to \{0,1\}$ that on an
input $x = (x(1), x(2), \ldots, x(n))$ outputs

$$\phi(x) = \Big(\bigvee_{i \in P} x(i)\Big) \vee \Big(\bigvee_{i \in N} \overline{x(i)}\Big),$$

where $P$ and $N$ are disjoint subsets of $\{1, 2, \ldots, n\}$. We allow either $P$ or $N$ to be empty, and the
empty disjunction is interpreted as the constant 0 function. For any index $i \in \{1, 2, \ldots, n\}$, we call it
a relevant index for $\phi$ if $i \in P \cup N$ and an irrelevant index for $\phi$ otherwise. For any relevant
index $i$, we call it positive if $i \in P$ and negative if $i \in N$.
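For concreteness, here is a minimal sketch of evaluating a disjunction given its positive and negative index sets, together with the regret of a prediction sequence against it. Indices are 1-based as in the text; the function names and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Set

Predictor = Callable[[Sequence[int]], int]


def disjunction(P: Set[int], N: Set[int]) -> Predictor:
    """Return phi with positive indices P and negative indices N (1-based)."""
    if P & N:
        raise ValueError("P and N must be disjoint")

    def phi(x: Sequence[int]) -> int:
        # A positive literal fires when x(i) = 1, a negative one when x(i) = 0.
        # With P and N both empty this is the constant 0 function.
        return int(any(x[i - 1] == 1 for i in P) or any(x[i - 1] == 0 for i in N))

    return phi


def disjunction_regret(
    predictions: List[int], examples: List[Sequence[int]], labels: List[int], phi: Predictor
) -> int:
    """Number of learner errors minus number of errors made by phi."""
    return sum(
        int(y_hat != y) - int(phi(x) != y)
        for y_hat, x, y in zip(predictions, examples, labels)
    )


phi = disjunction({1}, {3})  # x(1) OR NOT x(3), over n = 3 variables
print(phi((0, 1, 1)), phi((1, 0, 1)), phi((0, 0, 0)))
```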


4 General Hardness Result 

In this section, we identify two combinatorial properties that make an online sleeping combinatorial
optimization problem computationally hard.

Definition 1. Let $n$ be a positive integer. Consider an instance of online sleeping combinatorial optimization
where the ground set $U$ has $d$ elements with $3n + 2 \le d \le \mathrm{poly}(n)$. This instance is called a hard
instance with parameter $n$ if there exists a subset $U_s \subseteq U$ of size $3n + 2$ and a bijection between
$U_s$ and the set (i.e., a labeling of the elements in $U_s$ by the set)

$$\bigcup_{i=1}^{n} \{(i, 0), (i, 1), (i, \star)\} \cup \{0, 1\},$$

such that the decision set $\mathcal{D}$ satisfies the following properties:

Algorithm 1 Algorithm $\mathrm{Alg}_{\mathrm{disj}}$ for learning disjunctions

Require: An algorithm $\mathrm{Alg}_{\mathrm{osco}}$ for the online sleeping combinatorial optimization problem,
    and the input size $n$ for the disjunction learning problem.
1: Construct a hard instance $(U, \mathcal{D})$ with parameter $n$ of the online sleeping combinatorial
    optimization problem, and run $\mathrm{Alg}_{\mathrm{osco}}$ on it.
2: for $t = 1, 2, \ldots, T$ do
3:     Receive $x_t \in \{0,1\}^n$.
4:     Set the set of sleeping elements for $\mathrm{Alg}_{\mathrm{osco}}$ to be
       $S_t = \{(i, 1 - x_t(i)) \mid i = 1, 2, \ldots, n\}$.
5:     Obtain an action $V_t \in \mathcal{D}$ by running $\mathrm{Alg}_{\mathrm{osco}}$ such that $V_t \cap S_t = \emptyset$.
6:     Set $\hat{y}_t = \mathbb{1}[0 \notin V_t]$.
7:     Predict $\hat{y}_t$, and receive the true label $y_t$.
8:     In algorithm $\mathrm{Alg}_{\mathrm{osco}}$, set the loss of the awake elements $e \in U \setminus S_t$ as follows:

       $$\ell_t(e) = \begin{cases} \dfrac{1 - y_t}{n+1} & \text{if } e \ne 0, \\[2mm] y_t - \dfrac{n(1 - y_t)}{n+1} & \text{if } e = 0. \end{cases}$$

9: end for


1. (Heaviness) Any action $V \in \mathcal{D}$ has at least $n + 1$ elements in $U_s$.

2. (Richness) For all $(s_1, \ldots, s_{n+1}) \in \{0, 1, \star\}^n \times \{0, 1\}$, the action
$\{(1, s_1), (2, s_2), \ldots, (n, s_n), s_{n+1}\} \subseteq U_s$ is in $\mathcal{D}$.
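The two properties can be checked by brute force on small instances. The sketch below assumes the special elements are encoded as Python tuples `(i, s)` with `s` in `{0, 1, "*"}` and the last two elements as the integers `0` and `1`; the helper names and the choice of the $(n+1)$-subsets instance as a test case are illustrative assumptions.

```python
from itertools import combinations, product
from typing import FrozenSet, Set

STAR = "*"


def special_elements(n: int) -> Set:
    """The 3n + 2 labels: (i, 0), (i, 1), (i, *) for i = 1..n, plus 0 and 1."""
    return {(i, s) for i in range(1, n + 1) for s in (0, 1, STAR)} | {0, 1}


def is_hard_instance(n: int, decision_set: Set[FrozenSet]) -> bool:
    """Brute-force check of heaviness and richness (exponential in n)."""
    U_s = special_elements(n)
    heavy = all(len(V & U_s) >= n + 1 for V in decision_set)
    rich = all(
        frozenset({(i, s[i - 1]) for i in range(1, n + 1)} | {s[n]}) in decision_set
        for s in product(*([(0, 1, STAR)] * n + [(0, 1)]))
    )
    return heavy and rich


def k_subsets_instance(n: int) -> Set[FrozenSet]:
    """All subsets of size n + 1 of the 3n + 2 special elements."""
    return {frozenset(c) for c in combinations(special_elements(n), n + 1)}


print(is_hard_instance(2, k_subsets_instance(2)))
```

The check enumerates all $2 \cdot 3^n$ rich actions, so it is only meant for sanity-testing small values of $n$.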

We now show how to use the above definition of hard instances to prove the hardness of an online sleeping
combinatorial optimization (OSCO) problem by reducing from the online agnostic learning of disjunctions
(OALD) problem. At a high level, the reduction works as follows. Given an instance of the OALD problem,
we construct a specific instance of the OSCO problem and a sequence of losses and availabilities based on the
input to the OALD problem. This reduction has the property that for any disjunction, there is a special
set of at most $n + 1$ actions such that (a) exactly one of these actions is available in any round and (b) the
loss of this action exactly equals the loss of the disjunction on the current input example. Furthermore, the
action chosen by the OSCO algorithm can be converted into a prediction in the OALD problem with no greater
loss. These two facts imply that the regret of the OALD algorithm is at most $n + 1$ times the per-action regret
of the OSCO algorithm.
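The reduction loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the learner interface (`choose`, `update`) is hypothetical, `FirstAwakeLearner` is a trivial stand-in rather than a no-regret algorithm, and the decision set contains only the $2 \cdot 3^n$ actions guaranteed by richness.

```python
from fractions import Fraction
from itertools import product

STAR = "*"


class FirstAwakeLearner:
    """Hypothetical stand-in for the OSCO learner: plays the first awake action."""

    def __init__(self, decision_set):
        self.decision_set = list(decision_set)

    def choose(self, sleeping):
        return next(V for V in self.decision_set if V.isdisjoint(sleeping))

    def update(self, loss):
        pass  # a real learner would use the revealed losses here


def rich_actions(n):
    """The actions {(1, s_1), ..., (n, s_n), s_{n+1}} required by richness."""
    return [
        frozenset({(i, s[i - 1]) for i in range(1, n + 1)} | {s[n]})
        for s in product(*([(0, 1, STAR)] * n + [(0, 1)]))
    ]


def run_reduction(n, learner, examples, labels):
    """Return (number of prediction errors, total OSCO loss of the learner)."""
    ground = {(i, s) for i in range(1, n + 1) for s in (0, 1, STAR)} | {0, 1}
    mistakes, osco_loss = 0, Fraction(0)
    for x, y in zip(examples, labels):
        sleeping = {(i, 1 - x[i - 1]) for i in range(1, n + 1)}
        V = learner.choose(sleeping)
        y_hat = int(0 not in V)  # predict 1 unless the action uses element 0
        base = Fraction(1 - y, n + 1)
        loss = {e: base for e in ground - sleeping}
        loss[0] = y - n * base  # element 0 gets the special loss
        learner.update(loss)
        mistakes += int(y_hat != y)
        osco_loss += sum(loss[e] for e in V)
    return mistakes, osco_loss


examples = list(product((0, 1), repeat=2))
print(run_reduction(2, FirstAwakeLearner(rich_actions(2)), examples, [0, 1, 1, 0]))
```

Because every action in this toy decision set has exactly $n + 1$ elements, the learner's total loss coincides with its number of prediction errors; exact rational arithmetic is used so that this can be compared without rounding.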

Theorem 1. Consider an online sleeping combinatorial optimization problem such that for any positive
integer $n$, there is a hard instance with parameter $n$ of the problem. Suppose there is an algorithm
$\mathrm{Alg}_{\mathrm{osco}}$ that, for any instance of the problem with ground set $U$ of size $d$, runs in time
$\mathrm{poly}(T, d)$ and has regret bounded by $\mathrm{poly}(d) \cdot T^{1-\delta}$ for some $\delta \in (0, 1)$.
Then there exists an algorithm $\mathrm{Alg}_{\mathrm{disj}}$ for online agnostic learning of disjunctions over $n$
variables with running time $\mathrm{poly}(T, n)$ and regret $\mathrm{poly}(n) \cdot T^{1-\delta}$.

Proof. $\mathrm{Alg}_{\mathrm{disj}}$ is given in Algorithm 1. First, we note that in each round $t$, we have

$$\ell_t(V_t) \ge \mathbb{1}[y_t \ne \hat{y}_t]. \tag{2}$$

We prove this separately for two different cases; in both cases, the inequality follows from the heaviness
property, i.e., the fact that $|V_t| \ge n + 1$.

1. If $0 \notin V_t$, then the prediction of $\mathrm{Alg}_{\mathrm{disj}}$ is $\hat{y}_t = 1$, and thus

$$\ell_t(V_t) = |V_t| \cdot \frac{1 - y_t}{n+1} \ge 1 - y_t = \mathbb{1}[y_t \ne \hat{y}_t].$$

2. If $0 \in V_t$, then the prediction of $\mathrm{Alg}_{\mathrm{disj}}$ is $\hat{y}_t = 0$, and thus

$$\ell_t(V_t) = (|V_t| - 1) \cdot \frac{1 - y_t}{n+1} + \Big(y_t - \frac{n(1 - y_t)}{n+1}\Big) \ge y_t = \mathbb{1}[y_t \ne \hat{y}_t].$$

Note that if $V_t$ satisfies the equality $|V_t| = n + 1$, then we have the equality
$\ell_t(V_t) = \mathbb{1}[y_t \ne \hat{y}_t]$; this property will be useful later.
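Both cases of inequality (2), and the equality when $|V_t| = n + 1$, can be confirmed by exhaustive enumeration over small parameters. The sketch below is only a numerical sanity check under the loss assignment of Algorithm 1; the function names are hypothetical, and an action is summarized by its size and whether it contains the element $0$.

```python
from fractions import Fraction


def action_loss(n: int, size: int, contains_zero: bool, y: int) -> Fraction:
    """Loss of an awake action with `size` elements under Algorithm 1's losses."""
    base = Fraction(1 - y, n + 1)
    if contains_zero:
        # size - 1 ordinary elements plus the special loss of element 0
        return (size - 1) * base + (y - n * base)
    return size * base


def mistake(contains_zero: bool, y: int) -> int:
    """Indicator that the prediction derived from the action is wrong."""
    y_hat = 0 if contains_zero else 1
    return int(y_hat != y)


def check_inequality(max_n: int) -> bool:
    for n in range(1, max_n + 1):
        for size in range(n + 1, 3 * n + 3):  # heaviness: size >= n + 1
            for contains_zero in (False, True):
                for y in (0, 1):
                    loss = action_loss(n, size, contains_zero, y)
                    if loss < mistake(contains_zero, y):
                        return False
                    if size == n + 1 and loss != mistake(contains_zero, y):
                        return False
    return True


print(check_inequality(6))
```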

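Next, let $\phi$ be an arbitrary disjunction, and let $i_1 < i_2 < \cdots < i_m$ be its relevant indices sorted in
increasing order. Define $f_\phi : \{1, 2, \ldots, m\} \to \{0, 1\}$ as
$f_\phi(j) := \mathbb{1}[i_j \text{ is a positive index for } \phi]$, and define the set of elements
$W_\phi := \{(i, \star) \mid i \text{ is an irrelevant index for } \phi\}$. Finally, let
$\mathcal{D}_\phi = \{V_\phi^1, V_\phi^2, \ldots, V_\phi^{m+1}\}$ be the set of $m + 1$ actions where for
$j = 1, 2, \ldots, m$, we define

$$V_\phi^j := \{(i_\ell, 1 - f_\phi(\ell)) \mid 1 \le \ell < j\} \cup \{(i_j, f_\phi(j))\} \cup \{(i_\ell, \star) \mid j < \ell \le m\} \cup W_\phi \cup \{1\},$$

and

$$V_\phi^{m+1} := \{(i_\ell, 1 - f_\phi(\ell)) \mid 1 \le \ell \le m\} \cup W_\phi \cup \{0\}.$$

The actions in $\mathcal{D}_\phi$ are indeed in the decision set $\mathcal{D}$ due to the richness property.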

We claim that $\mathcal{D}_\phi$ contains exactly one awake action in every round and that the awake action
contains the element $1$ if and only if $\phi(x_t) = 1$. First, we prove uniqueness: if $V_\phi^j$ and $V_\phi^k$
(where $j < k$) are both awake in the same round, then $(i_j, f_\phi(j)) \in V_\phi^j$ and
$(i_j, 1 - f_\phi(j)) \in V_\phi^k$ are both awake elements, contradicting our choice of $S_t$. To prove the rest
of the claim, we consider two cases:

1. If $\phi(x_t) = 1$, then there is at least one $j \in \{1, 2, \ldots, m\}$ such that $x_t(i_j) = f_\phi(j)$. Let
$j'$ be the smallest such $j$. Then, by construction, the set $V_\phi^{j'}$ is awake at time $t$, and
$1 \in V_\phi^{j'}$, as required.

2. If $\phi(x_t) = 0$, then for all $j \in \{1, 2, \ldots, m\}$ we must have $x_t(i_j) = 1 - f_\phi(j)$. Then, by
construction, the set $V_\phi^{m+1}$ is awake at time $t$, and $0 \in V_\phi^{m+1}$, as required.
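The claim can be verified exhaustively for small $n$. The following sketch builds the special actions from the definitions above and checks, for every input in $\{0,1\}^n$, that exactly one of them is awake and that it contains the element $1$ exactly when the disjunction evaluates to 1. The encoding of elements as tuples and the function names are assumptions made for this example.

```python
from itertools import product

STAR = "*"


def special_actions(n, P, N):
    """The m + 1 actions built from a disjunction with index sets P and N."""
    relevant = sorted(P | N)
    f = [1 if i in P else 0 for i in relevant]  # f[l] = 1 iff index is positive
    W = {(i, STAR) for i in range(1, n + 1) if i not in P and i not in N}
    m = len(relevant)
    actions = []
    for j in range(m):  # 0-based position j plays the role of j + 1 in the text
        V = {(relevant[l], 1 - f[l]) for l in range(j)}
        V.add((relevant[j], f[j]))
        V |= {(relevant[l], STAR) for l in range(j + 1, m)}
        actions.append(frozenset(V | W | {1}))
    last = {(relevant[l], 1 - f[l]) for l in range(m)}
    actions.append(frozenset(last | W | {0}))
    return actions


def phi(x, P, N):
    return int(any(x[i - 1] == 1 for i in P) or any(x[i - 1] == 0 for i in N))


def exactly_one_awake(n, P, N):
    actions = special_actions(n, P, N)
    for x in product((0, 1), repeat=n):
        sleeping = {(i, 1 - x[i - 1]) for i in range(1, n + 1)}
        awake = [V for V in actions if V.isdisjoint(sleeping)]
        if len(awake) != 1 or (1 in awake[0]) != (phi(x, P, N) == 1):
            return False
    return True


print(exactly_one_awake(3, {1}, {3}))
```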

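Since every action in $\mathcal{D}_\phi$ has exactly $n + 1$ elements, and since, if $V$ is the awake action in
$\mathcal{D}_\phi$ at time $t$, we just showed that $1 \in V$ if and only if $\phi(x_t) = 1$, exactly the same
argument as in the beginning of this proof implies that

$$\ell_t(V) = \mathbb{1}[y_t \ne \phi(x_t)]. \tag{3}$$

Furthermore, since exactly one action in $\mathcal{D}_\phi$ is awake every round, we have

$$\sum_{t=1}^{T} \ell_t(V_t) = \sum_{V \in \mathcal{D}_\phi} \sum_{t \,:\, V \in A_t} \ell_t(V_t). \tag{4}$$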

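Finally, we can bound the regret of algorithm $\mathrm{Alg}_{\mathrm{disj}}$ (denoted $\mathrm{Regret}_T^{\mathrm{disj}}$)
in terms of the regret of algorithm $\mathrm{Alg}_{\mathrm{osco}}$ (denoted $\mathrm{Regret}_T^{\mathrm{osco}}$) as follows:

$$\mathrm{Regret}_T^{\mathrm{disj}}(\phi) = \sum_{t=1}^{T} \mathbb{1}[\hat{y}_t \ne y_t] - \mathbb{1}[\phi(x_t) \ne y_t] \le \sum_{V \in \mathcal{D}_\phi} \sum_{t \,:\, V \in A_t} \ell_t(V_t) - \ell_t(V)$$

$$= \sum_{V \in \mathcal{D}_\phi} \mathrm{Regret}_T^{\mathrm{osco}}(V) \le |\mathcal{D}_\phi| \cdot \mathrm{poly}(d) \cdot T^{1-\delta} = \mathrm{poly}(n) \cdot T^{1-\delta}.$$

The first inequality follows from (2), (3), and (4), and the last equation holds since $|\mathcal{D}_\phi| \le n + 1$
and $d \le \mathrm{poly}(n)$. □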

4.1 Hardness results for Policy Regret and Ranking Regret 

Our technique for proving hardness extends easily to ranking regret (and therefore to policy regret). The
reduction simply uses any algorithm for minimizing ranking regret as $\mathrm{Alg}_{\mathrm{osco}}$ in Algorithm 1.
This works because, in the proof of Theorem 1, the set $\mathcal{D}_\phi$ has the property that exactly one action
$V \in \mathcal{D}_\phi$ is awake in any round $t$, and that action satisfies
$\ell_t(V) = \mathbb{1}[y_t \ne \phi(x_t)]$. Thus, if we consider a ranking where the actions in $\mathcal{D}_\phi$
are ranked at the top positions (in arbitrary order), the loss of this ranking exactly equals the number of
errors made by the disjunction $\phi$ on the input sequence. The same arguments as in the proof of Theorem 1
then imply that the regret of $\mathrm{Alg}_{\mathrm{disj}}$ is bounded by that of $\mathrm{Alg}_{\mathrm{osco}}$,
implying the hardness result.


5 Hard Instances for Specific Problems

Now we apply Theorem 1 to prove that many online sleeping combinatorial optimization problems are at least as
hard as PAC learning DNF expressions by constructing hard instances for them. Note that all these problems
admit efficient no-regret algorithms in the non-sleeping setting.


5.1 Online Shortest Path Problem 


In the Online Shortest Path problem, the learner is given a directed graph $G = (V, E)$ and designated
source and sink vertices $s$ and $t$. The ground set is the set of edges, i.e., $U = E$, and the decision set
$\mathcal{D}$ is the set of all paths from $s$ to $t$. The sleeping version of this problem has been called the
Online Sabotaged Shortest Path problem by Koolen et al. [2015], who posed the open question of whether it
admits an efficient no-regret algorithm. For any $n \in \mathbb{N}$, a hard instance is the graph $G^{(n)}$ shown
in Figure 2. It has $3n + 2$ edges that are labeled by the elements of the ground set
$U = \bigcup_{i=1}^{n} \{(i, 0), (i, 1), (i, \star)\} \cup \{0, 1\}$, as required. Now note that any $s$-$t$ path in
this graph has length exactly $n + 1$, so $\mathcal{D}$ satisfies the heaviness property. Furthermore, the
richness property is clearly satisfied, since for any $s \in \{0, 1, \star\}^n \times \{0, 1\}$, the set of edges
$\{(1, s_1), (2, s_2), \ldots, (n, s_n), s_{n+1}\}$ is an $s$-$t$ path and therefore in $\mathcal{D}$.
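The figure itself is not reproduced in this text, so the sketch below assumes one natural graph consistent with the stated properties: a chain of $n + 1$ stages, where stage $i \le n$ has three parallel edges labeled $(i, 0)$, $(i, 1)$, $(i, \star)$ and the last stage has two parallel edges labeled $0$ and $1$. The vertex numbering and function names are hypothetical. Enumerating all source-to-sink paths confirms that each has exactly $n + 1$ edges and that there are $2 \cdot 3^n$ of them.

```python
STAR = "*"


def layered_multigraph(n):
    """Edges as (tail, head, label); vertex 0 is the source, vertex n + 1 the sink."""
    edges = [(i - 1, i, (i, s)) for i in range(1, n + 1) for s in (0, 1, STAR)]
    edges += [(n, n + 1, 0), (n, n + 1, 1)]
    return edges


def st_paths(edges, source, sink):
    """Enumerate all source-to-sink paths of a DAG as frozensets of edge labels."""
    out = {}
    for tail, head, label in edges:
        out.setdefault(tail, []).append((head, label))
    paths = []

    def extend(vertex, labels):
        if vertex == sink:
            paths.append(frozenset(labels))
            return
        for head, label in out.get(vertex, []):
            extend(head, labels + [label])

    extend(source, [])
    return paths


n = 3
paths = st_paths(layered_multigraph(n), 0, n + 1)
print(len(paths), {len(p) for p in paths})
```

The exponential number of paths is what makes a direct experts-style approach over actions infeasible on this instance.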


5.2 Online Minimum Spanning Tree Problem 

In the Online Minimum Spanning Tree problem, the learner is given a fixed graph $G = (V, E)$. The
ground set here is the set of edges, i.e., $U = E$, and the decision set $\mathcal{D}$ is the set of spanning
trees in the graph. For any $n \in \mathbb{N}$, a hard instance is the same graph $G^{(n)}$ shown in Figure 2,
except that the edges are undirected. Note that the spanning trees in $G^{(n)}$ are exactly the paths from $s$
to $t$. The hardness of this problem immediately follows from the hardness of the Online Shortest Paths problem.


5.3 Online $k$-Subsets Problem

In the Online $k$-Subsets problem, the learner is given a fixed ground set of elements $U$. The decision set
$\mathcal{D}$ is the set of subsets of $U$ of size $k$. For any $n \in \mathbb{N}$, we construct a hard instance
with parameter $n$ of the Online $k$-Subsets problem with $k = n + 1$ and $d = 3n + 2$. The set $\mathcal{D}$ of
all subsets of size $k = n + 1$ of a ground set $U$ of size $d = 3n + 2$ clearly satisfies both the heaviness and
richness properties.


5.4 Online $k$-Truncated Permutations Problem

In the Online $k$-Truncated Permutations problem (also called the Online $k$-Ranking problem), the
learner is given a complete bipartite graph with $k$ nodes on one side and $m \ge k$ nodes on the other, and the
ground set $U$ is the set of all edges; thus $d = km$. The decision set $\mathcal{D}$ is the set of all maximal
matchings, which can be interpreted as truncated permutations of $k$ out of $m$ objects. For any
$n \in \mathbb{N}$, we construct a hard instance with parameter $n$ of the Online $k$-Truncated Permutations
problem with $k = n + 1$, $m = 3n + 2$ and $d = km = (n + 1)(3n + 2)$. Let $L = \{u_1, u_2, \ldots, u_{n+1}\}$ be
the nodes on the left side of the bipartite graph, and since $m = 3n + 2$, let
$R = \{v_{i,0}, v_{i,1}, v_{i,\star} \mid i = 1, 2, \ldots, n\} \cup \{v_0, v_1\}$ denote the nodes on the right
side of the graph. The ground set $U$ consists of all $d = km = (n + 1)(3n + 2)$ edges joining nodes in $L$ to
nodes in $R$. We now specify the special $3n + 2$ elements of the ground set $U$: for $i = 1, 2, \ldots, n$,
label the edges $(u_i, v_{i,0})$, $(u_i, v_{i,1})$, $(u_i, v_{i,\star})$ by $(i, 0)$, $(i, 1)$, $(i, \star)$
respectively. Finally, label the edges $(u_{n+1}, v_0)$, $(u_{n+1}, v_1)$ by $0$ and $1$ respectively. The
resulting bipartite graph is shown in Figure 3, where only the special labeled edges are shown for clarity.

Now note that any maximal matching in this graph has exactly $n + 1$ edges, so the heaviness condition
is satisfied. Furthermore, the richness property is satisfied, since for any
$s \in \{0, 1, \star\}^n \times \{0, 1\}$, the set of edges $\{(1, s_1), (2, s_2), \ldots, (n, s_n), s_{n+1}\}$ is a
maximal matching and therefore in $\mathcal{D}$.

Figure 3: The complete bipartite graph described in the text; only the special labeled edges are shown for clarity.

Figure 4: Graph for the Online Bipartite Matching problem.

5.5 Online Bipartite Matching Problem 

In the Online Bipartite Matching problem, the learner is given a fixed bipartite graph $G = (V, E)$. The
ground set here is the set of edges, i.e., $U = E$, and the decision set $\mathcal{D}$ is the set of maximal
matchings in $G$. For any $n \in \mathbb{N}$, a hard instance with parameter $n$ is the graph shown in Figure 4.
It has $3n + 2$ edges that are labeled by the elements of the ground set
$U = \bigcup_{i=1}^{n} \{(i, 0), (i, 1), (i, \star)\} \cup \{0, 1\}$, as required. Now note that any maximal
matching in this graph has size exactly $n + 1$, so $\mathcal{D}$ satisfies the heaviness property.
Furthermore, the richness property is clearly satisfied, since for any $s \in \{0, 1, \star\}^n \times \{0, 1\}$,
the set of edges $\{(1, s_1), (2, s_2), \ldots, (n, s_n), s_{n+1}\}$ is a maximal matching and therefore in
$\mathcal{D}$.

5.6 Online Minimum Cut Problem 

In the Online Minimum Cut problem, the learner is given a fixed graph $G = (V, E)$ with a designated pair
of vertices $s$ and $t$. The ground set here is the set of edges, i.e., $U = E$, and the decision set
$\mathcal{D}$ is the set of cuts separating $s$ and $t$: a cut here is a set of edges that, when removed from the
graph, disconnects $s$ from $t$. For any $n \in \mathbb{N}$, a hard instance is the graph shown in Figure 5. It
has $3n + 2$ edges that are labeled by the elements of the ground set
$U = \bigcup_{i=1}^{n} \{(i, 0), (i, 1), (i, \star)\} \cup \{0, 1\}$, as required. Now note that any cut in
this graph has size at least $n + 1$, so $\mathcal{D}$ satisfies the heaviness property. Furthermore, the
richness property is clearly satisfied, since for any $s \in \{0, 1, \star\}^n \times \{0, 1\}$, the set of edges
$\{(1, s_1), (2, s_2), \ldots, (n, s_n), s_{n+1}\}$ is a cut and therefore in $\mathcal{D}$.

Figure 5: Graph for the Online Minimum Cut problem.


6 Conclusion

In this paper we showed that obtaining an efficient no-regret algorithm for sleeping versions of several natural
online combinatorial optimization problems is as hard as efficiently PAC learning DNF expressions, a
long-standing open problem. Our reduction technique requires only very modest conditions for hard instances
of the problem of interest, and in fact is considerably more flexible than the specific form presented in this
paper. We believe that almost any natural combinatorial optimization problem that includes instances with
exponentially many solutions will be a hard problem in its online sleeping variant. Furthermore, our hardness
result is via stochastic i.i.d. availabilities and losses, a rather benign form of adversary. This suggests that
obtaining sublinear per-action regret is perhaps a rather hard objective, and that to obtain efficient
algorithms we might need to either (a) make suitable simplifications of the regret criterion or (b) restrict the
adversary's power.